An observation–action mapping is a simple way of implementing stimulus–response learning. The mapping records the positive or negative results of previous actions taken by a software agent or robot, indexed by its observation of the current state of the world or by incoming events. The result of performing an action can then be added to the mapping to help with future decisions. The choice of action may use a winner-takes-all approach, always choosing the action with the best previous results, or a more probabilistic approach, more often taking actions that were previously more positive (or less negative), but occasionally taking less obviously good ones. The choice of strategy influences the exploration–exploitation trade-off.
Used on page 378
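A minimal sketch of how such a mapping might be realised in Python. The class and method names (ObservationActionMapping, record, choose) are illustrative assumptions, and the epsilon-greedy selection rule shown is just one simple way of trading off exploration against exploitation; a softmax over average rewards would be another.

```python
import random
from collections import defaultdict

class ObservationActionMapping:
    """Records the results of past actions, keyed by the observation
    under which each action was taken (illustrative sketch)."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon  # probability of exploring a random action
        self.totals = defaultdict(float)  # (observation, action) -> summed reward
        self.counts = defaultdict(int)    # (observation, action) -> times tried

    def record(self, observation, action, reward):
        """Add the positive or negative result of an action to the mapping."""
        self.totals[(observation, action)] += reward
        self.counts[(observation, action)] += 1

    def choose(self, observation):
        """Epsilon-greedy choice: usually exploit the action with the best
        average past result, occasionally explore a random one."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)  # explore

        def mean_reward(action):
            n = self.counts[(observation, action)]
            return self.totals[(observation, action)] / n if n else 0.0

        return max(self.actions, key=mean_reward)  # exploit (winner takes all)

# Hypothetical usage: record one outcome, then act on the same observation.
mapping = ObservationActionMapping(actions=["turn left", "turn right"])
mapping.record("wall ahead", "turn left", reward=1.0)
print(mapping.choose("wall ahead"))
```

Setting epsilon to zero gives the pure winner-takes-all behaviour described above; raising it shifts the balance towards exploration.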